Data transfer techniques within data storage devices, such as network attached storage performing data migration

ABSTRACT

A stand-alone, network accessible data storage device, such as a filer or NAS device, is capable of transferring data objects based on portions of the data objects. The device transfers portions of files, folders, and other data objects from a data store within the device to external secondary storage based on certain criteria, such as time-based criteria, age-based criteria, and so on. A portion may be one or more blocks of a data object, or one or more chunks of a data object, or other segments that combine to form or store a data object. For example, the device identifies one or more blocks of a data object that satisfy certain criteria, and migrates the identified blocks to external storage, thereby freeing up storage space within the device. The device may determine that a certain number of blocks of a file have not been modified or called by a file system in a certain time period, and migrate these blocks to secondary storage.

CROSS-REFERENCE TO RELATED APPLICATION

This application claims priority to U.S. Patent Application No. 61/097,176 filed Sep. 15, 2008 (entitled DATA TRANSFER TECHNIQUES WITHIN DATA STORAGE DEVICES, SUCH AS NETWORK ATTACHED STORAGE PERFORMING DATA MIGRATION, Attorney Docket No. 60692-8066.US00), the entirety of which is incorporated by reference herein.

BACKGROUND

Network attached storage (NAS) often refers to a computing system, attached to a network, which provides file-based data storage services to other devices on the network. A NAS system, or NAS device, may include a file system (e.g., under Microsoft Windows) that manages the data storage services, but is generally controlled by other resources via an IP address or other communication protocol. A NAS device may also include an operating system, although the operating system is often configured only to facilitate operations performed by the NAS system. Mainly, a NAS device includes one or more redundantly arranged hard disks, such as RAID arrays. A NAS device works with various file-based and/or communication protocols, such as NFS (Network File System) for UNIX or LINUX systems, SMB/CIFS (Server Message Block/Common Internet File System) for Windows systems, or iSCSI (Internet SCSI) for IP communications.

NAS devices provide some functionality similar to that of Storage Area Networks (SANs), although typical NAS devices only facilitate file level storage. Some hybrid systems exist, which provide both NAS and SAN functionalities. However, in these hybrid systems, such as Openfiler on LINUX, the NAS device serves the SAN device at the file level, and not at a file system level, such as at the individual file level. For example, the assignee's U.S. Pat. No. 7,546,324, entitled Systems and Methods for Performing Storage Operations Using Network Attached Storage, describes how individual files in a NAS device can be written to secondary storage, and are replaced in the NAS device with a stub having a pointer to the secondary storage location where the file now resides.

A NAS device may provide centralized storage to client computers on a network, but may also assist in load balancing and fault tolerance for resources such as email and/or web server systems. Additionally, NAS devices are generally smaller and easy to install on a network.

NAS device performance generally depends on traffic and the speed of the traffic on the attached network, as well as the capacity of a cache memory on the NAS device. Because a NAS device supports multiple protocols and contains reduced processing and operating systems, its performance may suffer when many users or many operations attempt to utilize the NAS device. The contained hardware intrinsically limits a typical NAS device, because it is self-contained and self-supported. For example, the capacity of its local memory may limit a typical NAS device's ability to provide data storage to a network, among other problems.

The need exists for a system that overcomes the above problems, as well as one that provides additional benefits. Overall, the examples herein of some prior or related systems and their associated limitations are intended to be illustrative and not exclusive. Other limitations of existing or prior systems will become apparent to those of skill in the art upon reading the following Detailed Description.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram illustrating components of a data stream utilized by a suitable data storage system.

FIG. 2 is a block diagram illustrating an example of a data storage system.

FIG. 3 is a block diagram illustrating an example of components of a server used in data storage operations.

FIG. 4 is a block diagram illustrating a NAS device within a networked computing system.

FIG. 5 is a block diagram illustrating the components of a NAS device configured to perform data migration.

FIGS. 6A and 6B are schematic diagrams illustrating a data store before and after a block-based data migration, respectively.

FIG. 7 is a flow diagram illustrating a routine for performing block-level data migration in a NAS device.

FIG. 8 is a flow diagram illustrating a routine for performing chunk-level data migration in a NAS device.

FIG. 9 is a flow diagram illustrating a routine for block-based or chunk-based data restoration and modification via a NAS device.

DETAILED DESCRIPTION

Overview

Described in detail herein is a system and method that transfers or migrates data objects within a stand-alone network storage device, such as a filer or network-attached storage (NAS) device. In some examples, a NAS device transfers segments, portions, increments, or proper subsets of data objects stored in local memory of the NAS device. The NAS device may transfer portions of files, folders, and other data objects from a cache to secondary storage based on certain criteria, such as time-based criteria, age-based criteria, and so on. A portion may be one or more blocks of a data object, or one or more chunks of a data object, or other data portions that combine to form, store, and/or contain a data object, such as a file.

In some examples, the NAS device performs block-based migration of data. A data migration component within the NAS device identifies one or more blocks of a data object stored in a cache or data storage that satisfy certain criteria, and migrates the identified blocks. For example, the data migration component may determine that a certain number of blocks of a file have not been modified or called by a file system within a certain time period, and migrate these blocks to secondary storage. The data migration component then maintains the other blocks of the file in primary storage. In some cases, the data migration component automatically migrates data without requiring user input. Additionally, the migration may be transparent to a user.

In some examples, the NAS device performs chunk-based migration of data. A chunk is, for example, a group or set of blocks. One or more chunks may comprise a portion of a file, folder, or other data object. The data migration component identifies one or more chunks of a data object that satisfy certain criteria, and migrates the identified chunks. For example, the data migration component may determine that a certain number of chunks of a file have not been modified or called by a file system in a certain time period, and migrate these chunks to secondary storage. The system then maintains the other chunks of the file in the cache or data storage of the NAS device.

Network-attached storage, such as a filer or NAS device, and associated data migration components and processes, will now be described with respect to various examples. The following description provides specific details for a thorough understanding of, and enabling description for, these examples of the system. However, one skilled in the art will understand that the system may be practiced without these details. In other instances, well-known structures and functions have not been shown or described in detail to avoid unnecessarily obscuring the description of the examples of the system.

The terminology used in the description presented below is intended to be interpreted in its broadest reasonable manner, even though it is being used in conjunction with a detailed description of certain specific examples of the system. Certain terms may even be emphasized below; however, any terminology intended to be interpreted in any restricted manner will be overtly and specifically defined as such in this Detailed Description section. A suitable data storage system will first be described, followed by a description of suitable stand-alone devices. Following that, various data migration and data recovery processes will be discussed.

Suitable System

Referring to FIG. 1, a block diagram illustrating components of a data stream utilized by a suitable data storage system, such as a system that performs network attached storage, is shown. The stream 110 may include a client 111, a media agent 112, and a secondary storage device 113. For example, in storage operations, the system may store, receive and/or prepare data, such as blocks or chunks, to be stored, copied or backed up at a server or client 111. The system may then transfer the data to be stored to media agent 112, which may then refer to storage policies, schedule policies, and/or retention policies (and other policies) to choose a secondary storage device 113, such as a NAS device that receives data and transfers data to attached secondary storage devices. The media agent 112 may include or be associated with a NAS device, to be discussed herein.

The secondary storage device 113 receives the data from the media agent 112 and stores the data as a secondary copy, such as a backup copy. Secondary storage devices may be magnetic tapes, optical disks, USB and other similar media, disk and tape drives, and so on. Of course, the system may employ other configurations of stream components not shown in the Figure.

Referring to FIG. 2, a block diagram illustrating an example of a data storage system 200 is shown. Data storage systems may contain some or all of the following components, depending on the needs of the system. FIG. 2 and the following discussion provide a brief, general description of a suitable computing environment in which the system can be implemented. Although not required, aspects of the system are described in the general context of computer-executable instructions, such as routines executed by a general-purpose computer, e.g., a server computer, wireless device or personal computer. Those skilled in the relevant art will appreciate that the system can be practiced with other communications, data processing, or computer system configurations, including: Internet appliances, network PCs, mini-computers, mainframe computers, and the like. Indeed, the terms “computer,” “host,” and “host computer” are generally used interchangeably herein, and refer to any of the above devices and systems, as well as any data processor.

Aspects of the system can be embodied in a special purpose computer or data processor that is specifically programmed, configured, or constructed to perform one or more of the computer-executable instructions explained in detail herein. Aspects of the system can also be practiced in distributed computing environments where tasks or modules are performed by remote processing devices, which are linked through a communications network, such as a Local Area Network (LAN), Wide Area Network (WAN), Storage Area Network (SAN), Fibre Channel, or the Internet. In a distributed computing environment, program modules may be located in both local and remote memory storage devices.

Aspects of the system may be stored or distributed on computer-readable media, including tangible storage media, such as magnetically or optically readable computer discs, hard-wired or preprogrammed chips (e.g., EEPROM semiconductor chips), nanotechnology memory, biological memory, or other data storage media. Alternatively, computer implemented instructions, data structures, screen displays, and other data under aspects of the system may be distributed over the Internet or over other networks (including wireless networks), on a propagated signal on a propagation medium (e.g., an electromagnetic wave(s), a sound wave, etc.) over a period of time, or they may be provided on any analog or digital network (packet switched, circuit switched, or other scheme). Those skilled in the relevant art will recognize that portions of the system reside on a server computer, while corresponding portions reside on a client computer, and thus, while certain hardware platforms are described herein, aspects of the system are equally applicable to nodes on a network.

For example, the data storage system 200 contains a storage manager 210, one or more clients 111, one or more media agents 112, and one or more storage devices 113. Storage manager 210 controls media agents 112, which may be responsible for transferring data to storage devices 113. Storage manager 210 includes a jobs agent 211, a management agent 212, a database 213, and/or an interface module 214. Storage manager 210 communicates with client(s) 111. One or more clients 111 may access data to be stored by the system from database 222 via a data agent 221. The system uses media agents 112, which contain databases 231, to transfer and store data into storage devices 113. The storage devices 113 may include network attached storage, such as the NAS devices described herein. Client databases 222 may contain data files and other information, while media agent databases may contain indices and other data structures that assist and implement the storage of data into secondary storage devices, for example.

The data storage and recovery system may include software and/or hardware components and modules used in data storage operations. The components may be storage resources that function to copy data during storage operations. The components may perform other storage operations (or storage management operations) other than operations used in data stores. For example, some resources may create, store, retrieve, and/or migrate primary or secondary data copies of data. Additionally, some resources may create indices and other tables relied upon by the data storage system and other data recovery systems. The secondary copies may include snapshot copies and associated indices, but may also include other backup copies such as HSM copies, archive copies, auxiliary copies, and so on. The resources may also perform storage management functions that may communicate information to higher level components, such as global management resources.

In some examples, the system performs storage operations based on storage policies, as mentioned above. For example, a storage policy includes a set of preferences or other criteria to be considered during storage operations. The storage policy may determine or define a storage location and/or set of preferences about how the system transfers data to the location and what processes the system performs on the data before, during, or after the data transfer. In some cases, a storage policy may define a logical bucket in which to transfer, store or copy data from a source to a data store, such as storage media. Storage policies may be stored in storage manager 210, or may be stored in other resources, such as a global manager, a media agent, and so on. Further details regarding storage management and resources for storage management will now be discussed.

Referring to FIG. 3, a block diagram illustrating an example of components of a server used in data storage operations is shown. A server, such as storage manager 210, may communicate with clients 111 to determine data to be copied to storage media. As described above, the storage manager 210 may contain a jobs agent 211, a management agent 212, a database 213, and/or an interface module 214. Jobs agent 211 may manage and control the scheduling of jobs (such as copying data files) from clients 111 to media agents 112. Management agent 212 may control the overall functionality and processes of the data storage system, or may communicate with global managers. Database 213 or another data structure may store storage policies, schedule policies, retention policies, or other information, such as historical storage statistics, storage trend statistics, and so on. Interface module 214 may interact with a user interface, enabling the system to present information to administrators and receive feedback or other input from the administrators or with other components of the system (such as via APIs).

Suitable Storage Devices

Referring to FIG. 4, a block diagram illustrating components of a networked data storage device, such as a filer or NAS device 440, configured to perform data migration within a networked computing system is shown. (While the examples below discuss a NAS device, any architecture or networked data storage device employing the following principles may be used, including a proxy computer coupled to the NAS device). The computing system 400 includes a data storage system 410, such as the tiered data storage system 200. Client computers 420, including computers 422 and 424, are associated with users that generate data to be stored in secondary storage. The client computers 422 and 424 communicate with the data storage system 410 over a network 430, such as a private network such as an Intranet, a public network such as the Internet, and so on. The networked computing system 400 includes network attached storage, such as NAS device 440. The NAS device 440 includes NAS-based storage or memory, such as a cache 444, for storing data received from the network, such as data from client computers 422 and 424. (The term “cache” is used generically herein for any type of storage, and thus the cache 444 can include any type of storage for storing of data files within the NAS device, such as magnetic disk, optical disk, semiconductor memory, or other known types of storage such as magnetic tape or types of storage hereafter developed.) The cache 444 may include an index or other data structure in order to track where data is eventually stored or the index may be stored elsewhere, such as on the proxy computer. The index may include information associating the data with information identifying a secondary storage device that stored the data, or other information. For example, as described in detail below, the index may include both an indication of which blocks have been written to secondary storage (and where they are stored in secondary storage), and a look up table that maps blocks to individual files stored within the NAS device 440.

The NAS device 440 also includes a data migration component 442 that performs data migration on data stored in the cache 444. While shown in FIG. 4 as being within the NAS device 440, the data migration component 442 may be on a proxy computer coupled to the NAS device. In some cases, the data migration component 442 is a device driver or agent that performs block-level data migration of data stored in the cache. In some cases, the data migration component 442 performs chunk-based data migration of data stored in the cache. Additionally, in some cases the data migration component 442 may perform file-based data migration, or a combination of two or more types of data migration, depending on the needs of the system. During data migration, the NAS device transfers data from the cache of the device to one or more secondary storage devices 450 located on the network 430, such as magnetic tapes 452, optical disks 454, or other secondary storage 456. The NAS device may include various data storage components when identifying and transferring data from the cache 444 to the secondary storage devices 450. These components will now be discussed.

Referring to FIG. 5, a block diagram illustrating the components of a NAS device 440 configured to perform data migration is shown. In addition to a data migration component 442 and cache 444, the NAS device 440 may include an input component 510, a data reception component 520, a file system 530, and an operating system 540. The input component 510 may receive various inputs, such as via an iSCSI protocol. That is, the NAS device may receive commands or control data from a data storage system 410 over IP channels. For example, the data storage system 410 may send commands to a NAS device's IP address in order to provide instructions to the NAS device. The data reception component 520 may receive data to be stored over multiple protocols, such as NFS, CIFS, and so on. For example, a UNIX based system may send data to be stored to the NAS device over an NFS communication channel, while a Windows based system may send data to be stored to the NAS device over a CIFS communication channel.

Additionally, the NAS device 440 may include a number of data storage resources, such as a data storage engine 560 to direct reads from and writes to the data store 444, and one or more media agents 570. The media agents 570 may be similar to the media agents 112 described herein. In some cases, the NAS device 440 may include two or more media agents 570, such as multiple media agents 570 externally attached to the NAS device 440. The NAS device 440 may expand its data storage capabilities by adding media agents 570, as well as other components.

As discussed herein, the NAS device 440 includes a data migration component 442 capable of transferring some or all of the data stored in the cache 444. In some examples, the data migration component 442 requests and/or receives information from a callback layer 550, or other intermediate component, within the NAS device 440. Briefly, the callback layer 550 intercepts calls for data between the file system 530 and the cache 444, and tracks these calls to provide information to the data migration component 442 regarding when data is changed, updated, and/or accessed by the file system 530. Further details regarding the callback layer 550 and other intermediate components will now be discussed.

In some examples, the NAS device monitors the transfer of data from the file system 530 to the cache 444 via the callback layer 550. The callback layer 550 not only facilitates the migration of data portions from data storage on the NAS device to secondary storage, but also facilitates read back or callback of that data from the secondary storage back to the NAS device. While described at times herein as a device driver or agent, the callback layer 550 may be a layer, or additional file system, that resides on top of the file system 530. The callback layer 550 may intercept data requests from the file system 530, in order to identify, track and/or monitor data requested by the file system 530 and store information associated with these requests in a data structure, such as a bitmap similar to the one shown in Table 1. Thus, the callback layer stores information identifying when a data portion is accessed by tracking calls from the file system 530 to the cache 444. For example, Table 1 provides entry information that tracks calls to a data store:

TABLE 1

Chunk of File1    Access Time
File1.1           09.05.2008 @12:00
File1.2           09.05.2008 @12:30
File1.3           09.05.2008 @13:30
File1.4           06.04.2008 @12:30

In this example, the file system 530 creates a data object named “File1,” using a chunking component (described herein) to divide the file into four chunks: “File1.1,” “File1.2,” “File1.3,” and “File1.4.” The file system 530 stores the four chunks to the cache 444 on Jun. 4, 2008. According to the table, the file system can determine that it has not accessed chunk File1.4 since its creation, and most recently accessed the other chunks on Sep. 5, 2008. Of course, Table 1 may include additional, other or different information, such as information identifying a location of the chunks, information identifying the type of media storing the chunks, information identifying the blocks within the chunk, and/or other information or metadata.

Thus, providing data migration to the NAS device enables the device to facilitate inexpensive, transparent storage to a networked computing system, and to free up storage space by migrating or archiving stale data to other locations, among other benefits. Of course, non-networked computing systems may also store data to the NAS devices described herein. Because the NAS devices described herein can be easily and quickly installed on networks, they provide users, such as network administrators, with a quick and efficient way to expand their storage capacity without incurring the typical costs associated with typical NAS devices that do not perform data migration.

For example, adding a NAS device described herein to an existing networked computing system can provide the computing system with expanded storage capabilities, but can also provide the computing system with other data storage functionality. In some examples, where the NAS device described herein includes a data storage engine (e.g., a common technology engine, or CTE, provided by Commvault Systems, Inc. of Oceanport, N.J.), the NAS device may act as a backup server. For example, such a device may perform various data storage functions normally provided by a backup server, such as single instancing, data classification, mirroring, content indexing, data backup, encryption, compression, and so on. Thus, in some examples, the NAS device described herein acts as a fully functional and independent device an administrator can attach to a network to perform virtually any data storage function.

Also, in some cases, the NAS device described herein may act to perform fault tolerance in a data storage system. For example, the clustering of NAS devices on a system may provide a higher level of security, because processes on one device can be replicated on another. Thus, attaching two or more of the NAS devices described herein may provide an administrator with the redundancy or security required in some data storage systems.

Data Migration in Storage Devices

As described herein, in some examples, the NAS device leverages block-level or chunk-based data migration in order to provide expanded storage capabilities to a networked computing system.

Block-level migration, or block-based data migration, involves migrating disk blocks from the data store or cache 444 to secondary media, such as secondary storage devices 450. Using block-level migration, the NAS device 440 transfers blocks from the cache that have not been recently accessed to secondary storage, freeing up space on the cache.

As described above, the system can transfer or migrate certain blocks of a data object from one data store to another, such as from a cache in a NAS device to secondary storage. Referring to FIGS. 6A-6B, a schematic diagram illustrating contents of two data stores before and after a block-based data migration is shown. In FIG. 6A, a first data store 610 contains primary copies (i.e., production copies) of two data objects, a first data object 620 and a second data object 630. The first data object comprises blocks A and A′, where blocks A are blocks that satisfy or meet certain storage criteria (such as blocks that have not been modified since creation or not been modified within a certain period of time) and blocks A′ are blocks that do not meet the criteria (such as blocks that have been modified within the certain time period). The second data object comprises blocks B and B′, where blocks B satisfy the criteria and blocks B′ do not meet the criteria.

FIG. 6B depicts the first data store 610 after a block-based data migration of the two data objects 620 and 630. In this example, the system only transfers the data from blocks that satisfy the criteria (blocks A and B) from the first data store 610 to a second data store 640, such as secondary storage 642, 644. The secondary storage may include one or more magnetic tapes, one or more optical disks, and so on. The system maintains data in the remaining blocks (blocks A′ and B′) within the first data store 610.
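The before-and-after states of FIGS. 6A and 6B can be illustrated with two dictionaries standing in for the first and second data stores. This is a minimal sketch under assumed names: `migrate_blocks`, the block labels, and the `modified_recently` flag are hypothetical and simply encode whether a block meets the criteria.

```python
def migrate_blocks(primary, secondary, satisfies_criteria):
    """Move every block that meets the criteria from primary to secondary.

    Both stores are plain dicts (block id -> block record); this is an
    illustration of the data movement only, not of real storage media.
    """
    moved = [block_id for block_id, block in primary.items()
             if satisfies_criteria(block)]
    for block_id in moved:
        secondary[block_id] = primary.pop(block_id)
    return moved


# FIG. 6A: the first data store holds all blocks of objects 620 and 630.
primary = {
    "A-1": {"object": 620, "modified_recently": False},
    "A-2": {"object": 620, "modified_recently": False},
    "A'-1": {"object": 620, "modified_recently": True},
    "B-1": {"object": 630, "modified_recently": False},
    "B'-1": {"object": 630, "modified_recently": True},
}
secondary = {}

moved = migrate_blocks(primary, secondary,
                       lambda block: not block["modified_recently"])

# FIG. 6B: only the A' and B' blocks remain in the first data store.
print(sorted(primary), sorted(secondary))
```

Passing the criteria in as a function keeps the movement logic independent of whichever policy (time-based, age-based, and so on) decides what qualifies.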

The system can perform file system data migration at a block level, unlike previous systems that only migrate data at the file level (that is, they have a file-level granularity). By tracking migrated blocks, the system can also restore data at the block level, which may avoid cost and time problems associated with restoring data at the file level.

Referring to FIG. 7, a flow diagram illustrating a routine 700 for performing block-level data migration in a NAS device is shown. In step 710, the NAS device, via the data migration component 442, identifies data blocks within a cache that satisfy certain criteria. The data migration component 442 may compare some or all of the blocks (or, information associated with the blocks) in the cache with predetermined criteria. The predetermined criteria may be time-based criteria within a storage policy or data retention policy.

In some examples, the data migration component 442 identifies blocks set to be “aged off” from the cache. That is, the data migration component 442 identifies blocks created, changed, or last modified before a certain date and time. For example, the system may review a cache for all data blocks that satisfy a criterion or criteria. The data store may be an electronic mailbox or personal folders (.pst) file for a Microsoft Exchange user, and the criterion may define, for example, all blocks or emails last modified or changed thirty days ago or earlier. The component 442 compares information associated with the blocks, such as metadata associated with the blocks, to the criteria, and identifies all blocks that satisfy the criteria. For example, the component 442 identifies all blocks in the .pst file not modified within the past thirty days. The identified blocks may include all the blocks for some emails and/or a portion of the blocks for other emails. That is, for a given email (or data object), a first portion of the blocks that include the email may satisfy the criteria, while a second portion of the blocks that include the same email may not satisfy the criteria. In other words, a file or a data object can be divided into parts or portions, and only some of the parts or portions change.

To determine which blocks have changed, and when, the NAS device can monitor the activity of a NAS device's file system 530 via the callback layer 550. The NAS device may store a data structure, such as a bitmap, table, log, and so on within the cache 444 or other memory in the NAS device or elsewhere, and update the data structure whenever the file system calls the cache 444 to access and update or change data blocks within the cache 444. The callback layer 550 traps commands to the cache 444, where that command identifies certain blocks on a disk for access or modifications, and writes to the data structure the changed blocks and the time of the change. The data structure may include information such as an identification of changed blocks and a date and a time the blocks were changed. The data structure, which may be a table, bitmap, or group of pointers, such as a snapshot, may also include other information, such as information that maps file names to blocks, information that maps chunks to blocks and/or file names, and so on, and identify when accesses/changes were made. Table 2 provides entry information for tracking the activity of a file system with the “/users” directory:

TABLE 2

Blocks                    Date and Time Modified
/users/blocks1-100        09.08.2008 @14:30
/users/blocks101-105      09.04.2008 @12:23
/users2/blocks106-110     09.04.2008 @11:34
/users3/blocks110-1000    08.05.2008 @10:34

Thus, if a storage policy identified the time Aug. 30, 2008 @12:00 as a threshold time criterion, where data modified after the time is to be retained, the system would identify, in step 710, blocks 110-1000 as having satisfied the criteria. Thus, the system, via the intermediate component 550, can monitor what blocks are requested by a file system, and act accordingly, as described herein.
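The selection made in step 710 for the Table 2 entries can be worked through in a few lines. The sketch below assumes the table is held as a dictionary and that the dates read as month.day.year; the function name `blocks_to_migrate` is hypothetical.

```python
from datetime import datetime

# Table 2 entries: block range -> date and time last modified.
modification_log = {
    "/users/blocks1-100": datetime(2008, 9, 8, 14, 30),
    "/users/blocks101-105": datetime(2008, 9, 4, 12, 23),
    "/users2/blocks106-110": datetime(2008, 9, 4, 11, 34),
    "/users3/blocks110-1000": datetime(2008, 8, 5, 10, 34),
}

# Threshold from the storage policy in the example: Aug. 30, 2008 @12:00.
threshold = datetime(2008, 8, 30, 12, 0)


def blocks_to_migrate(log, threshold):
    """Return entries not modified after the threshold.

    Data modified after the threshold is retained, so only entries at or
    before it are candidates for migration.
    """
    return [entry for entry, modified in log.items() if modified <= threshold]


selected = blocks_to_migrate(modification_log, threshold)
print(selected)
```

Only the `/users3/blocks110-1000` entry predates the threshold, which matches the outcome stated in the text.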

In step 720, the NAS device transfers data within the identified blocks from the cache to a media agent 570, to be stored in a different data store. The system may perform some or all of the processes described with respect to FIGS. 1-3 when transferring the data to the media agent. For example, before transferring data, the system may review a storage policy as described herein to select a media agent, such as media agent 112, based on instructions within the storage policy. In step 725, the system optionally updates an allocation table, such as a file allocation table (FAT) for the file system 530 associated with the NAS device, to indicate the data blocks that no longer contain data and are now free to receive and store data from the file system.

In step 730, via the media agent 570, the NAS device 440 stores data from the blocks to a different data store. In some cases, the NAS device, via the media agent 570, stores the data from the blocks to a secondary storage device, such as a magnetic tape 452 or optical disk 454. For example, the NAS device may store the data from the blocks in secondary copies of the data store, such as a backup copy, an archive copy, and so on.

The NAS device may create, generate, update, and/or include an allocation table (such as a table for the data store) that tracks the transferred data and the data that was not transferred. The table may include information identifying the original data blocks for the data, the name of the data object (e.g., file name), the location of any transferred data blocks (including, e.g., offset information), and so on. For example, Table 3 provides entry information for an example .pst file:

TABLE 3

Name of Data Object            Location of data
Email1                         C:/users/blocks1-100
Email2.1 (body of email)       C:/users/blocks101-120
Email2.2 (attachment)          X:/remov1/blocks1-250
Email3                         X:/remov2/blocks300-500

In the above example, the data for “Email2” is stored in two locations, the cache (C:/) and an off-site data store (X:/). The system maintains the body of the email, recently modified or accessed, at a location within a data store associated with a file system, “C:/users/blocks101-120.” The system stores the attachment, not recently modified or accessed, in a separate data store, “X:/remov1/blocks1-250.” Of course, the table may include other information, fields, or entries not shown. For example, when the system stores data to tape, the table may include tape identification information, tape offset information, and so on.
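A table of this kind can be sketched as a simple mapping from portions of a data object to locations. The following is only an illustration: the Location type, the LOCAL_STORE constant, the helper names, and the convention that portions are named "Name.N" are assumptions made for the example, not part of the described system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    store: str        # "C:" for the cache, "X:" for the off-site data store
    path: str
    first_block: int
    last_block: int


LOCAL_STORE = "C:"  # assumption: the cache is the only local store

# Entries taken from Table 3.
location_table = {
    "Email1": Location("C:", "/users", 1, 100),
    "Email2.1": Location("C:", "/users", 101, 120),
    "Email2.2": Location("X:", "/remov1", 1, 250),
    "Email3": Location("X:", "/remov2", 300, 500),
}


def portions_of(table, name):
    """Return the entries for a data object and its numbered portions."""
    return {key: loc for key, loc in table.items()
            if key == name or key.startswith(name + ".")}


def migrated_portions(table, name):
    """Return the names of the portions that no longer reside in the cache."""
    return sorted(key for key, loc in portions_of(table, name).items()
                  if loc.store != LOCAL_STORE)


print(migrated_portions(location_table, "Email2"))  # ['Email2.2']
```

Here the body of Email2 stays in the cache while only its attachment is reported as migrated, mirroring the split shown in the table.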

Chunked file migration, or chunk-based data migration, involves splitting a data object into two or more portions of the data object, creating an index that tracks the portions, and storing the data object to secondary storage via the two or more portions. Among other things, the chunk-based migration provides for fast and efficient storage of a data object. Additionally, chunk-based migration facilitates fast and efficient recall of a data object, such as the large files described herein. For example, if a user modifies a migrated file, chunk-based migration enables a data restore component to retrieve, and migrate back to secondary storage, only the chunk containing the modified portion of the file, and not the entire file.

As described above, in some examples the NAS device migrates chunks of data (sets of blocks) that comprise a data object from the cache 444 to another data store. A data object, such as a file, may comprise two or more chunks. A chunk may be a logical division of a data object. For example, a .pst file may include two or more chunks: a first chunk that stores data associated with an index of a user's mailbox, and one or more chunks that store email, attachments, and so on within the user's mailbox. A chunk is a proper subset of all the blocks that contain a file. That is, for a file contained or defined by n blocks, the largest chunk of the file contains at most n-1 blocks.

In some cases, the data migration component 442 may include a chunking component that divides data objects into chunks. The chunking component may receive files to be stored in the cache 444, divide the files into two or more chunks, and store the files as two or more chunks in the cache. The chunking component may update an index that associates information about the files with the chunks of the files, the data blocks of the chunks, and so on.

The chunking component may perform different processes when determining how to divide a data object. For example, the chunking component may include indexing, header, and other identifying information or metadata in a first chunk, and include the payload in other chunks. The chunking component may identify and/or retrieve file format or schema information from an index, FAT, NFS, or other allocation table in the file system to determine where certain chunks of a data object reside (such as the first or last chunk of a large file). The chunking component may follow a rules-based process when dividing a data object. The rules may define a minimum or maximum data size for a chunk, a time of creation for data within a chunk, a type of data within a chunk, and so on.

For example, the chunking component may divide a user mailbox (such as a .pst file) into a number of chunks, based on various rules that assign emails within the mailbox to chunks based on the metadata associated with the emails. The chunking component may place an index of the mailbox in a first chunk and the emails in other chunks. The chunking component may then divide the other chunks based on dates of creation, deletion or reception of the emails, size of the emails, sender of the emails, type of emails, and so on. Thus, as an example, the chunking component may divide a mailbox as follows:

User1/Chunk1    Index
User1/Chunk2    Sent emails
User1/Chunk3    Received emails
User1/Chunk4    Deleted emails
User1/Chunk5    All Attachments

Of course, other divisions are possible. Chunks may not necessarily fall within logical divisions. For example, the chunking component may divide a data object based on information or instructions not associated with the data object, such as information about data storage resources, information about a target secondary storage device, historical information about previous divisions, and so on.
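The mailbox division above can be sketched as a small rules-based assignment, in which each item is placed in the chunk of the first rule it matches. The rule list, the item fields ("id", "kind", "folder"), and the function name chunk_mailbox are hypothetical and chosen only to reproduce the example division.

```python
from collections import defaultdict

# Hypothetical rules, checked in order; the first matching rule wins.
RULES = [
    ("Chunk1", lambda item: item["kind"] == "index"),
    ("Chunk5", lambda item: item["kind"] == "attachment"),
    ("Chunk2", lambda item: item.get("folder") == "sent"),
    ("Chunk3", lambda item: item.get("folder") == "received"),
    ("Chunk4", lambda item: item.get("folder") == "deleted"),
]


def chunk_mailbox(user, items):
    """Assign each mailbox item to a chunk according to RULES."""
    chunks = defaultdict(list)
    for item in items:
        for label, matches in RULES:
            if matches(item):
                chunks[f"{user}/{label}"].append(item["id"])
                break
        else:
            # No rule matched; refuse to drop data silently.
            raise ValueError(f"no chunking rule matches item {item['id']!r}")
    return dict(chunks)


mailbox = [
    {"id": "idx", "kind": "index"},
    {"id": "m1", "kind": "email", "folder": "sent"},
    {"id": "m2", "kind": "email", "folder": "received"},
    {"id": "m3", "kind": "email", "folder": "deleted"},
    {"id": "a1", "kind": "attachment", "folder": "received"},
]
layout = chunk_mailbox("User1", mailbox)
print(layout["User1/Chunk5"])  # ['a1']
```

Because the attachment rule is checked before the folder rules, an attachment to a received email lands in the attachments chunk rather than with the received emails; reordering the rules would change that division.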

Referring to FIG. 8, a flow diagram illustrating a routine 800 for performing chunk-level data migration in a NAS device is shown. In step 810, the system identifies chunks of data blocks within a data store that satisfy one or more criteria. The data store may store large files (>50 MB), such as databases associated with a file system, SQL databases, Microsoft Exchange mailboxes, virtual machine files, and so on. The system may compare some or all of the chunks (or information associated with the chunks) of the data store with predetermined and/or dynamic criteria. The predetermined criteria may be time-based criteria within a storage policy or data retention policy. The system may review an index with the chunking component 815 when comparing the chunks with applicable criteria.

In step 820, the NAS device transfers data within the identified chunks from the data store to a media agent, to be stored in a different data store. The NAS device may perform some or all of the processes described with respect to FIGS. 1-3 when transferring the data to the media agent. For example, the NAS device may review a storage policy assigned to the data store and select a media agent based on instructions within the storage policy. In step 825, the system optionally updates an allocation table, such as a file allocation table (FAT) for a file system associated with the NAS device, to indicate the data blocks that no longer contain data and are now free to receive and store data from the file system.

In step 830, via one or more media agents 570, the NAS device 440 stores the data from the chunks to a different data store. In some cases, the system, via the media agent, stores the data to a secondary storage device, such as a magnetic tape or optical disk. For example, the system may store the data in secondary copies of the data store, such as a backup copy, an archive copy, and so on.
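The chunk-level routine can be summarized in a short sketch. It assumes plain dictionaries in place of the data store, the access index, and the secondary data store reached through a media agent, and a set in place of the allocation table; all names in it are hypothetical.

```python
from datetime import datetime


def migrate_chunks(data_store, last_access, threshold, secondary_store, free_blocks):
    """Move chunks last accessed before the threshold out of the data store.

    data_store maps chunk id -> {"data": bytes, "blocks": set of block numbers}.
    """
    # Identify the chunks that satisfy the time-based criterion (step 810).
    selected = sorted(c for c in data_store if last_access[c] < threshold)
    for chunk_id in selected:
        chunk = data_store[chunk_id]
        # Transfer the data and store it in the secondary data store (step 820 onward).
        secondary_store[chunk_id] = chunk["data"]
        # Mark the chunk's blocks as free for reuse by the file system (step 825).
        free_blocks.update(chunk["blocks"])
        del data_store[chunk_id]
    return selected


store = {
    "User1/Chunk1": {"data": b"index", "blocks": {1, 2}},
    "User1/Chunk5": {"data": b"attachments", "blocks": {3, 4, 5}},
}
accessed = {
    "User1/Chunk1": datetime(2008, 9, 8, 14, 30),
    "User1/Chunk5": datetime(2008, 8, 5, 10, 34),
}
secondary, freed = {}, set()
moved = migrate_chunks(store, accessed, datetime(2008, 8, 30, 12, 0), secondary, freed)
print(moved)  # ['User1/Chunk5']
```

In this sketch the recently accessed index chunk stays in the data store, while the older attachments chunk is moved and its blocks are released.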

Data Recovery in Storage Devices

A data storage system, using a NAS device leveraging the block-based or chunk-based data migration processes described herein, is able to restore portions of files instead of entire files, such as individual blocks or chunks that comprise portions of the files. Referring to FIG. 9, a flow diagram illustrating a routine 900 for block-based or chunk-based data restoration and modification is shown. In step 910, the system, via a restore or data recovery component, receives a request to modify a file located in a cache of a NAS device or in secondary storage in communication with a NAS device. For example, a user submits a request to a file system to provide an old copy of a large PowerPoint presentation so the user can modify a picture located on slide 5 of 300 of the presentation.

In step 920, the system identifies one or more blocks or one or more chunks associated with the request. For example, the callback layer 550 of the system looks to a table similar to Table 3, identifies blocks associated with page 5 of the presentation and blocks associated with a table of contents of the presentation, and contacts a NAS device that stored or migrated the blocks on secondary storage.

In step 930, the system, via the NAS device, retrieves the identified blocks or chunks from the secondary storage and presents them to the user. For example, the system retrieves only page 5 and the table of contents of the presentation and presents those pages to the user.

In step 940, the system receives input from a user to modify the retrieved blocks or chunks. For example, the user updates the PowerPoint presentation to include a different picture. In step 950, the system transfers data associated with the modified blocks or chunks back to the NAS device, where it remains in a cache or is transferred to secondary storage. For example, the system transfers the modified page 5 to the data store. The system may also update a table that tracks access to the data store, such as Table 1 or Table 3.

Thus, the system, leveraging block-based or chunk-based data migration in a NAS device, restores only the portions of data objects required by a file system. Such restoration can be advantageous, among other benefits, over systems that perform file-based restoration, because those systems restore entire files, which can be expensive, time consuming, and so on. Some files, such as .pst files, may contain large amounts of data. File-based restoration can therefore be inconvenient and cumbersome, among other things, especially when a user only requires a small portion of a large file.

For example, a user submits a request to the system to retrieve an old email stored in a secondary copy on removable media via a NAS device. The system identifies a portion of a .pst file associated with the user that contains a list of old emails in the cache of the NAS device, and retrieves the list. That is, the system has knowledge of the chunk that includes the list (e.g., a chunking component may always include the list in a first chunk of a data object), accesses the chunk, and retrieves the list. The other portions (e.g., all the emails within the .pst file) were transferred from the NAS device to secondary storage. The user selects the desired email from the list. The NAS device, via an index in the cache that associates chunks with data or files (such as an index similar to Table 3), identifies the chunk that contains the email, and retrieves the chunk from the associated secondary storage for presentation to the user. Thus, the NAS device is able to restore the email without restoring the entire mailbox (.pst file) associated with the user.
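The email example can be sketched as follows, again with dictionaries standing in for the cache, the secondary storage, and the index that associates items with chunks. The ChunkRestorer class and its attribute names are hypothetical; the sketch only shows that a single chunk, rather than the whole mailbox, is read back.

```python
class ChunkRestorer:
    """Restores individual chunks on demand (illustrative sketch only)."""

    def __init__(self, cache, secondary, index):
        self.cache = cache          # chunk id -> data held in the NAS cache
        self.secondary = secondary  # chunk id -> data held in secondary storage
        self.index = index          # item name -> id of the chunk containing it
        self.fetched = []           # chunk ids read back from secondary storage

    def read_item(self, name):
        """Return the chunk holding the item, reading it back if it was migrated."""
        chunk_id = self.index[name]
        if chunk_id not in self.cache:
            self.cache[chunk_id] = self.secondary[chunk_id]
            self.fetched.append(chunk_id)
        return self.cache[chunk_id]


restorer = ChunkRestorer(
    cache={"User1/Chunk1": "list of emails"},
    secondary={"User1/Chunk2": "sent emails", "User1/Chunk3": "received emails"},
    index={"email list": "User1/Chunk1", "old email": "User1/Chunk3"},
)
restorer.read_item("email list")   # served from the cache, nothing is read back
restorer.read_item("old email")    # reads back only User1/Chunk3
print(restorer.fetched)  # ['User1/Chunk3']
```

The chunk of sent emails is never touched, which is the benefit the example describes over restoring the entire .pst file.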

As noted above, the callback layer 550 maintains a data structure that not only tracks where a block or chunk resides on secondary storage, but also which file was affected based on the migration of that block or chunk. Portions of large files may be written to secondary storage to free up space in the data store 444 of the NAS device 440. Thus, to the network, the total data storage of the NAS device appears much greater than that actually available within the data store 444. For example, while the data store 444 may have only a 100 gigabyte capacity, its capacity may actually appear as 300 gigabytes, with over 200 gigabytes migrated to secondary storage.

To help ensure sufficient space to write back data from secondary storage to the data store 444 of the NAS device 440, the data store may be partitioned to provide a callback or read-back cache. For example, a disk cache may be established in the data store 444 of the NAS device 440 for the NAS device to write back data read from secondary storage. The size of the partition is configurable, and may be, for example, between 5 and 20 percent of the total capacity of the data store 444. In the above example, with a 100 gigabyte data store 444, 10 gigabytes may be reserved (10 percent) for data called back from secondary storage to the NAS device 440. This disk partition or callback cache can be managed in known ways, such that the oldest data called back to this disk partition is overwritten when room is needed to write new data.
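One known way to manage such a partition is first-in, first-out eviction, sketched below. The description does not specify a particular policy, so the ReadBackCache class, its method names, and the choice of FIFO ordering are assumptions made for illustration.

```python
from collections import OrderedDict


def reserved_capacity(total_gb, percent):
    """Size of the read-back partition for a configurable percentage."""
    if not 0 < percent < 100:
        raise ValueError("percent must be between 0 and 100")
    return total_gb * percent // 100


class ReadBackCache:
    """Fixed-size partition that overwrites its oldest entries first."""

    def __init__(self, capacity_gb):
        self.capacity = capacity_gb
        self.used = 0
        self.entries = OrderedDict()  # chunk id -> size in GB, oldest first

    def write_back(self, chunk_id, size_gb):
        """Write a recalled chunk, evicting the oldest chunks if needed."""
        if size_gb > self.capacity:
            raise ValueError("chunk is larger than the read-back cache")
        if chunk_id in self.entries:
            self.used -= self.entries.pop(chunk_id)
        evicted = []
        while self.used + size_gb > self.capacity:
            old_id, old_size = self.entries.popitem(last=False)
            self.used -= old_size
            evicted.append(old_id)
        self.entries[chunk_id] = size_gb
        self.used += size_gb
        return evicted


cache = ReadBackCache(reserved_capacity(100, 10))  # 10 GB of a 100 GB data store
cache.write_back("a", 4)
cache.write_back("b", 4)
print(cache.write_back("c", 4))  # ['a']
```

With 10 gigabytes reserved, the third 4 gigabyte chunk does not fit alongside the first two, so the oldest chunk is overwritten to make room.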

Conclusion

From the foregoing, it will be appreciated that specific examples of the data recovery system have been described herein for purposes of illustration, but that various modifications may be made without deviating from the spirit and scope of the system. For example, although files have been described, other types of content such as user settings, application data, emails, and other data objects can be imaged by snapshots. Accordingly, the system is not limited except as by the appended claims.

Unless the context clearly requires otherwise, throughout the description and the claims, the words “comprise,” “comprising,” and the like are to be construed in an inclusive sense, as opposed to an exclusive or exhaustive sense; that is to say, in the sense of “including, but not limited to.” The word “coupled”, as generally used herein, refers to two or more elements that may be either directly connected, or connected by way of one or more intermediate elements. Additionally, the words “herein,” “above,” “below,” and words of similar import, when used in this application, shall refer to this application as a whole and not to any particular portions of this application. Where the context permits, words in the above Detailed Description using the singular or plural number may also include the plural or singular number respectively. The word “or,” in reference to a list of two or more items, covers all of the following interpretations of the word: any of the items in the list, all of the items in the list, and any combination of the items in the list.

The above detailed description of embodiments of the system is not intended to be exhaustive or to limit the system to the precise form disclosed above. While specific embodiments of, and examples for, the system are described above for illustrative purposes, various equivalent modifications are possible within the scope of the system, as those skilled in the relevant art will recognize. For example, while processes or blocks are presented in a given order, alternative embodiments may perform routines having steps, or employ systems having blocks, in a different order, and some processes or blocks may be deleted, moved, added, subdivided, combined, and/or modified. Each of these processes or blocks may be implemented in a variety of different ways. Also, while processes or blocks are at times shown as being performed in series, these processes or blocks may instead be performed in parallel, or may be performed at different times.

The teachings of the system provided herein can be applied to other systems, not necessarily the system described above. The elements and acts of the various embodiments described above can be combined to provide further embodiments.

These and other changes can be made to the system in light of the above Detailed Description. While the above description details certain embodiments of the system and describes the best mode contemplated, no matter how detailed the above appears in text, the system can be practiced in many ways. Details of the system may vary considerably in implementation details, while still being encompassed by the system disclosed herein. As noted above, particular terminology used when describing certain features or aspects of the system should not be taken to imply that the terminology is being redefined herein to be restricted to any specific characteristics, features, or aspects of the system with which that terminology is associated. In general, the terms used in the following claims should not be construed to limit the system to the specific embodiments disclosed in the specification, unless the above Detailed Description section explicitly defines such terms. Accordingly, the actual scope of the system encompasses not only the disclosed embodiments, but also all equivalent ways of practicing or implementing the system under the claims.

While certain aspects of the system are presented below in certain claim forms, the applicant contemplates the various aspects of the system in any number of claim forms. For example, while only one aspect of the system is recited as a means-plus-function claim under 35 U.S.C. sec. 112, sixth paragraph, other aspects may likewise be embodied as a means-plus-function claim, or in other forms, such as being embodied in a computer-readable medium. (Any claims intended to be treated under 35 U.S.C. §112, ¶6 will begin with the words “means for”.) Accordingly, the applicant reserves the right to add additional claims after filing the application to pursue such additional claim forms for other aspects of the system.

1. A network attached storage (NAS) device, wherein the network attached storage device is configured to be connected to a networked computing system, the networked computing system including one or more secondary data storage devices and one or more client computers connected via a network, the network attached storage device comprising: a housing containing one or more components, the components including: a data reception component, wherein the data reception component is configured to receive for storage multiple data files from the one or more client computers via the network; an operating system, wherein the operating system is configured to provide a computing environment for the network attached storage device; at least one processor, wherein the at least one processor is programmed to perform one or more data storage functions for the network attached storage device; a non-volatile data store, wherein the non-volatile data store is configured to store the data files received from the data reception component; a file system, wherein the file system is configured to manage, for the network attached storage device, the writing of data files to, and the reading of data files from, the non-volatile data store; one or more media agents, wherein the one or more media agents are configured to receive instructions from the at least one processor and to transfer data stored in the non-volatile data store to the one or more secondary data storage devices, wherein the one or more secondary data storage devices are external to the network attached storage device but are accessible by the one or more media agents via the network; and a data migration component, wherein the data migration component is configured to identify portions of at least some of the data files within the non-volatile data store and to migrate the identified data file portions from the network attached storage device, wherein the identified data file portions are for storage by the one or more media agents to at least one of the one or more secondary storage devices, wherein the data migration component is further configured to identify portions of a selected data file within the non-volatile data store based at least in part on a data storage criteria, wherein the data storage criteria is associated with the writing of data to, and the reading of data from, portions of the selected data file by the file system, wherein the identified portions of the selected data file are less than all of the selected data file, and wherein the data migration component maintains a data structure that tracks a logical location of the identified data file portions stored in the one or more secondary storage devices, and maps the identified data file portions to the selected data file.
2. The network attached storage device of claim 1, further comprising: a data interception component, wherein the data interception component is configured to intercept data transferred from the data reception component to the non-volatile data store and to update an index associating information identifying the transferred data with information identifying a time of transfer to the non-volatile data store.
3. The network attached storage device of claim 1, further comprising: a data interception component, wherein the data interception component is configured to intercept access requests from the file system to the non-volatile data store and to update an index associating information identifying the access requests with information identifying a time of the access requests.
4. The network attached storage device of claim 1, wherein the data files consist of multiple blocks and wherein the data migration component identifies at least one block for migration from the network attached storage device, but not all of the multiple blocks, wherein the one block has an oldest access time as compared to other of the multiple blocks.
5. The network attached storage device of claim 1, wherein the data migration component is configured to identify data blocks in the non-volatile data store that have not been accessed by the file system within a predetermined time period or that have not changed since a predetermined time.
6. The network attached storage device of claim 1 wherein the non-volatile data store further comprising a reserved read-back cache, and wherein the network attached storage device further comprises a callback component configured to read back the identified data file portions stored in the one or more secondary storage devices and to write the identified data file portions to the read-back cache.
7. A computer-implemented method for tracking at least a first portion of at least one data object within a network attached storage (NAS) device coupled to a network, wherein the NAS device includes a NAS file system and a non-volatile data store, the method comprising: accessing calls to or from the NAS file system for reading of data from or writing of data to the non-volatile data store of the NAS device, wherein the at least one data object consists of multiple data blocks, wherein the non-volatile data store of the NAS device stores the multiple data blocks of the at least one data object; wherein the NAS file system of the NAS device controls the reading of data from or the writing of data to the multiple data blocks of the at least one data object, and wherein the accessing includes identifying individual blocks or groups of blocks within the multiple data blocks of the at least one data object that the NAS file system of the NAS device reads data from or writes data to; based on the accessing, identifying a portion of the multiple data blocks of the at least one data object that satisfies a data storage criteria, wherein the data storage criteria is a time-based criteria; and, based on the identifying, and independently of the NAS file system of the NAS device, updating a data structure, wherein the data structure tracks the portion of the multiple data blocks, and provides an indication of the at least one data object to which the portion of the multiple data blocks belongs.
8. The method of claim 7, further comprising: transferring data stored in the portion of the multiple data blocks to a separate storage device, wherein the separate storage device is not contained by or within the NAS device but communicates with the NAS device over the network, wherein the network is a private network; updating the data structure to include information associating the portion of the multiple data blocks with the separate storage device, wherein the data structure is an index is stored in the non-volatile data store of the NAS device; and removing information from an allocation table associated with the NAS file system of the NAS device, and wherein the at least one data object is a file, and wherein the portion of the multiple data blocks is less than all of the multiple data blocks for the file.
9. The method of claim 7, further comprising: transferring data stored in the portion of the multiple data blocks to a separate storage device, wherein the separate storage device is not contained by or within the NAS device but communicates with the NAS device over the network; and, updating the data structure to track a location of the portion of the multiple data blocks as being located in the separate storage device.
10. The method of claim 7, wherein the data storage criteria comprises a time period in which to retain data in a cache of the NAS device, or a time period in which a recent access of the portion must occur of the multiple data blocks.
11. A stand-alone data storage device, coupled to one or more external computing devices over a network, wherein at least one external storage device is also connected to the data storage device via the network, the data storage device comprising: at least one processor; a communication component coupled to the at least one processor and associated with a network address for the data storage device, wherein the communication component receives data transfer commands from the one or more external computing devices on the network, wherein the one or more external computing devices direct the data transfer commands to the data storage device via the network address for the data storage device, and wherein the data transfer commands direct operation of the data storage device; a non-volatile, internal data store, coupled to the at least one processor, wherein the internal data store stores data objects, wherein at least some of the data objects are comprised of multiple data blocks; a data storage component that comprises program code, which when executed by the processor, performs data storage tasks with respect to the internal data store; a file system that comprises program code, which when executed by the processor, stores and organizes data objects stored in the internal data store; a call intercept layer, in communication with the file system, that comprises program code, which when executed by the processor, recognizes calls to or from the file system for reading of data from or writing of data to individual data blocks or groups of data blocks stored within the internal data store; a data block identification component, in communication with the call intercept layer, that comprises program code, which when executed by the processor, identifies data blocks of the data object that satisfy a criteria, wherein the criteria is associated with the recognized calls to or from the file system for the reading of data from or the writing of data to the individual data blocks or groups of data blocks; and an index component that comprises program code, which when executed by the processor, updates an index to include information associating the identified data blocks with information identifying the data object.
12. The data storage device of claim 11 wherein at least one data object is a file having n number of data blocks, and wherein the data storage device further comprises: a media agent, in communication with the data block identification component, that comprises program code, which when executed by the processor, copies or transfers, via the network and to the external storage device, data for no more than n-1 identified data blocks; and wherein the index component includes a bitmap or a data location table mapping the no more than n-1 identified data blocks to a logical location on the network.
13. The data storage device of claim 11 wherein at least one data object is a file having n number of data blocks, and wherein the data storage device further comprises: a media agent, in communication with the data block identification component, that comprises program code, which when executed by the processor, copies or transfers, via the network and to the external storage device, data for no more than n-1 identified data blocks; and wherein the index component updates the index to include information associating the transferred data with information identifying tape offsets for the secondary storage device that contains the transferred data.
14. The data storage device of claim 11, wherein the criteria defines a time period related to when the file system last read data from or wrote data to individual data blocks or groups of data blocks.
15. The data storage device of claim 11, wherein the criteria defines a time period in which changes were made to data contained by blocks of the data object.
16. A method performed by a network attached storage device for storing a portion of a data object in a secondary storage device associated with the network attached storage device, the method comprising: identifying, within a cache of the network attached storage device, a data object having at least a first portion, wherein the first portion was last accessed by a file system of the network attached storage device before a predetermined time, wherein a second portion of the data object was last accessed by the file system after the predetermined time; transferring the first portion of the data object out of the network attached storage and to a secondary storage device communicatively coupled with the network attached storage device, wherein the secondary storage device is external to the network attached storage device; and maintaining the second portion of the data object in the cache of the network attached storage device.
17. A network attached storage device, comprising: a cache, wherein the cache stores one or more data objects; a media agent, wherein the media agent is configured to transfer portions of the one or more data objects from the cache to associated secondary storage devices, wherein the secondary storage devices are located external to the network attached storage device and configured to provide long term storage of data; and a data identification component, wherein the data identification component is configured to identify to the media agent the portions of the data objects to be transferred to the secondary storage devices based on one or more storage criteria.
18. The network attached storage device of claim 17, further comprising: an intermediate component, wherein the intermediate component tracks in an index all accesses of the one or more data objects by a file system within the network attached storage device, and wherein the data identification component identifies the portions of the data objects based on information within the index.
19. The network attached storage device of claim 17, wherein the identified portions are proper subsets of data blocks of the data objects.
20. The network attached storage device of claim 17, wherein the identified portions include chunks of the data objects created from a rule-based process of dividing the data object into two or more portions.
21. A system for storing a portion of a data object in a secondary storage device associated with a network attached storage (NAS) device, the system comprising: at least one processor; a data store; callback layer and block identifying means for identifying, from calls to or from a file system within the NAS device, a first portion a data object within the data store of the NAS device, wherein the first portion was last accessed by the file system of the network attached storage device before a predetermined time, wherein a second portion of the data object was last accessed by the file system after the predetermined time; media agent means for transferring the first portion of the data object to a secondary storage device associated with the network attached storage device; and index means for tracking the second portion of the data object in the data store of the network attached storage device.
22. The system of claim 21, further comprising a read-back cache, formed from the data store, and wherein the callback layer and block identifying means is further configured for reading back the first portion of the data object stored to the secondary storage device and for writing the first portion to the read-back cache.